Shrinkage Estimation for SAGE Data using a Mixture Dirichlet Prior

Authors

  • Jeffrey S. Morris
  • Keith A. Baggerly
  • Kevin R. Coombes
Abstract

Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based upon efficient estimates of these gene expression profiles, which consist of the estimated relative abundances for each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA species, and can be modeled using a multinomial distribution with two characteristics: skewness in the distribution of relative abundances and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample will fail to capture a large number of expressed mRNA species present in the tissue. Standard empirical estimates of the relative abundances effectively ignore these missing, unobserved species, and consequently also tend to overestimate the abundance of the scarce observed species, which make up the vast majority of the total. In this chapter, we review a new Bayesian procedure that yields improved estimates for the missing and scarce species without trading off much efficiency for the abundant species. The key to the procedure is the mixture Dirichlet prior, which stochastically partitions the mRNA species into abundant and scarce strata, with each stratum modeled with its own multivariate prior, a scalar multiple of a symmetric Dirichlet. Simulation studies demonstrate that the resulting shrinkage estimators have efficiency advantages over the MLE for the simulated SAGE scenarios.
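The contrast the abstract draws between the MLE and a shrinkage estimator can be illustrated with a small simulation. The sketch below is not the authors' mixture Dirichlet procedure: as a simplifying assumption it uses a single symmetric Dirichlet prior, whose posterior mean is (x_i + alpha) / (n + K * alpha). The number of species `k`, the power-law abundance profile, the sample size of 500, and `alpha = 0.1` are all hypothetical values chosen only for illustration.

```python
import numpy as np


def mle(counts):
    """Empirical relative abundances: x_i / n."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def dirichlet_posterior_mean(counts, alpha=0.5):
    """Posterior mean under a single symmetric Dirichlet(alpha) prior.

    This is a simplified stand-in, NOT the mixture Dirichlet prior
    described in the abstract: (x_i + alpha) / (n + K * alpha).
    """
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (counts + alpha) / (counts.sum() + k * alpha)


# Hypothetical scenario: many species, skewed abundances, small sample.
rng = np.random.default_rng(0)
k = 2000                                    # assumed number of mRNA species
weights = 1.0 / np.arange(1, k + 1) ** 1.2  # assumed skewed profile
p_true = weights / weights.sum()
counts = rng.multinomial(500, p_true)       # sample size << dimension

p_mle = mle(counts)
p_shrunk = dirichlet_posterior_mean(counts, alpha=0.1)

unseen = counts == 0
observed = ~unseen
print("unobserved species:", int(unseen.sum()))
print("MLE mass on unobserved species:", p_mle[unseen].sum())
print("shrinkage mass on unobserved species:", p_shrunk[unseen].sum())
```

With these settings the MLE assigns zero mass to every unobserved species and therefore spreads all of the probability over the observed ones, while the Dirichlet posterior mean moves some mass to the unobserved species and lowers the estimate for each observed species. A single symmetric prior shrinks abundant and scarce species alike, which is the trade-off the stratified mixture prior in the abstract is designed to reduce.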

Similar resources

Bayesian shrinkage estimation of the relative abundance of mRNA transcripts using SAGE.

Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue that yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expre...

Classic and Bayes Shrinkage Estimation in Rayleigh Distribution Using a Point Guess Based on Censored Data

Introduction: In classical methods of statistics, the parameter of interest is estimated based on a random sample using natural estimators such as maximum likelihood or unbiased estimators (sample information). In practice, the researcher has prior information about the parameter in the form of a point guess value. Information in the guess value is called nonsample information. Thomp...

Positive-Shrinkage and Pretest Estimation in Multiple Regression: A Monte Carlo Study with Applications

Consider the problem of predicting a response variable using a set of covariates in a linear regression model. If it is a priori known or suspected that a subset of the covariates does not significantly contribute to the overall fit of the model, a restricted model that excludes these covariates may be sufficient. If, on the other hand, the subset provides useful information, shrinkage meth...

Predictive performance of Dirichlet process shrinkage methods in linear regression

An obvious Bayesian nonparametric generalization of ridge regression assumes that coefficients are exchangeable, from a prior distribution of unknown form, which is given a Dirichlet process prior with a normal base measure. The purpose of this paper is to explore predictive performance of this generalization, which does not seem to have received any detailed attention, despite related applicat...

Dirichlet Process Mixtures of Beta Distributions, with Applications to Density and Intensity Estimation

We propose a class of Bayesian nonparametric mixture models with a Beta distribution providing the mixture kernel and a Dirichlet process prior assigned to the mixing distribution. Motivating applications include density estimation on bounded domains, and inference for non-homogeneous Poisson processes over time. We present the mixture model formulation, discuss prior specification, and develop...

Publication date: 2013